AD-A102  570 

UNCLASSIFIED 


CALIFORNIA  UNIV  BERKELEY  CENTER  FOR  PURE  AND  APPLIE— ETC 
METHODS  FOR  SCALING  TO  DOUBLY  STOCHASTIC  FORM* tU) 

JUN  81  B  N  PARLETTr  T  L  LANDIS 
PAM-44 


DIK  FILE  COPY  AD  A I  u  2  5  7 


CENTER  FOR  PURE  AND  APPLIED  MATHEMATICS 

UNIVERSITY  OF  CALIFORNIA,  BERKELEY  ^ ; S 


PAM-  W 


METHODS  FOR  SCALING  TO  DOUBLY  STOCHASTIC  FORM 


B.  N.  Parlett  and  T.  L.  Landis 


Legal  Notice 

This  report  was  prepared  as  an  account  of  work  sponsored  by 
the  Center  for  Pure  and  Applied  Mathematics.  Neither  the 
Center  nor  the  Department  of  Mathematics,  makes  any  warranty 
expressed  or  Implied,  or  assumes  any  legal  liability  or 
responsibility  for  the  accuracy,  completeness  or  usefulness 
of  any  Information  or  process  disclosed. 


II  )■'  :  -  •  P/tJhif 


Methods  for  Scaling  to  Doubly  Stochastic  Forrnj 


B.  N.J^arlett 1  and  T  L^-andis* 


/:  i '/ ' 


ABSTRACT 


New  methods  for  scaling  square,  nonnegative  matrices  to  dou¬ 
bly  stochastic  form  are  described.  A  generalized  version  of  the 
convergence  theorem  in  31NKH0RX  and  KNOPP  [1967]  is  proved 
and  applied  to  show  convergence  for  these  new  methods.  Tests 
indicate  that  one  of  the  new  methods  has  significantly  better  aver¬ 
age  and  worst-case  behavior  than  the  Sinkhorn/Knopp  method;  for 
one  of  the  3x3  examples  in  MARSPIALL  and  GLKj.N  [1968],  SK 
requires  130  times  as  many  operations  as  the  new  algorithm  to 
achieve  row  and  column  sums  1  -  10~5. 


1.  Introduction. 

We  seek  an  algorithm  which  will  find  a  pair  of  positive  diagonal  matrices  .0 
and  E  for  a  given  square  nonnegative  matrix,  A.  such  that  DAE  is  doubly 
stochastic— or  determine  that  such  a  pair  does  not  exist. 

A  nonnegative  n  xn  matrix  A  is  said  to  have  support  if  it  possesses  a  posi¬ 
tive  diagonal;  A  has  total  support  if  A  *  0  and  every  positive  entry  in  A  lies  on  a 
positive  diagonal.  A  is  fully  indecomposable  if  it  is  impossible  to  find  permuta¬ 
tion  matrices  P  and  Q  so  that: 


with  Ai  square. 
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Three  procedures  for  computing  D  and  E.  when  they  exist,  have  appeared 
in  the  literature:  (l)  minimize  xl  Ay  subject  to  the  constraints 

n  =  n  Vi.  =  i-  (2)  minimize 


/(*!•  '  J»)  = 


n 

fc  =  l  t=l 
n 

n  ** 

k=l 


subject  to  the  constraints 


xk  >  0  k  -  1,  •  •  ■  ,n  and  ^  x*  =  1 


and  (3)  compute  D  and  E  iteratively  by  alternately  normalizing  all  rows  and  all 
columns  in  A.  The  first  method  is  due  to  MARSHALL  and  0LK1N  [1963];  the 
second  is  described  in  DJOKVIC  [1970].  In  each  case,  the  minimization  problem 
is  shown  to  have  a  solution  when  A  is  fully  indecomposable.  The  third  algorithm 
was  first  described  by  DEMING  and  STEPHAN  [1940]  who  called  it  the  "Iterative 
Proportional  Fitting  Procedure".  It  was  rediscovered  by  SINKHORN 
[1962,1964,1966,1967],  and,  SINKHORN  and  KN0Pn  [1967]  proved  that  D  and  E 
exist  so  that  DAE  is  doubly  stochastic  if  and  only  if  A  possesses  total  support. 
Further,  they  showed  that  in  such  a  case,  the  iteration  converges  to  a  solution 
pair  D  and  E-  BRUALDI,  PARTER,  and  SCHNEIDER  [1966]  independently  proved 
the  existence  of  D  and  E  when  A  is  a  direct  sum  of  fully  indecomposable 
matrices  by  showing  that  its  corresponding  Menon  operator  (MENON  [1967])  has 
a  fixed  point.  Finally,  SINKHORN  [1966]  showed  that  the  Sinkhorn/Knopp 
method  converges  geometrically  for  positive  starting  matrices. 

The  following  result  can  be  applied  to  show  that  a  nonnegative  matrix  has 
total  support  if  and  only  if  it  is  a  direct  sum  of  fully  indecomposable  matrices: 


The  Erobenius  -Konig  Theorem  (MARCUS  and  MINC  [l964],p  97) 

A  nonnegative  n  xn  matrix  without  support  contains  an  s  xt  zero  subma¬ 
trix  where: 


s  +  t  =  n  +  1 
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In  this  paper  we  describe  three  new  iterative  procedures  for  scaling  nonne¬ 
gative  matrices  to  doubly  stochastic  form.  We  prove  a  generalized  version  of  the 
convergence  theorem  in  SIXKKORN  and  KXQPP  [1967]  and  apply  it  to  show  that 
for  starting  matrices  with  total  support,  these  new  iterations  converge  to  diago¬ 
nally  equivalent  limits  which  are  multiples  of  doubly  stochastic  matrices.  In  the 
final— and  most  interesting— section,  we  present  results  of  tests  comparing  our 
new  methods  to  the  Sinkhorn/Knopp  method  (SK).  One  of  the  new  algorthms, 
EQ,  exhibited  significantly  better  average  and  worst-case  behavior  than  SK:  for 
some  test  matrices,  SK  required  130  times  as  maan  operations  as  EQ  (where  an 
operation  is  a  multiply  or  a  divide)  and  examples  for  which  EQ  requires  more 
than  ten  times  as  many  operations  as  SK  are  rare. 

Techniques  for  seeding  to  doubly  stochastic  form  have  a  number  of  applica¬ 
tions.  The  problem  that  launched  Sinkhorn’s  research  was  estimating  the  tran¬ 
sition  matrix  in  a  Markov  chain.  MARSHALL  and  0LK1N  [1979]  contains  refer¬ 
ences  for  other  statistical  applications.  They  can  be  applied  to  equillibrate  a 
general  matrix  with  respect  to  any  p-norm,  p  ?  =»;  one  of  us  has  used  EQ  to  test 
for  diagonal  equivalence  to  orthogonal  form  by  equillibrating  with  respect  to  the 
2-norm.  Finally,  we  remark  that  doubly  stochastic  metr^es  oossess  the  follow¬ 
ing  interesting  properties:  (l)  They  are  "perfectly  baluncod"  with  respect  to 
the  1-norm  (  see  PARLETT  and  REINSCK  [1962]);  (2)  Their  p-norms  are  unity  for 
allp<=*>  (see  STOER  and  WITZGALL  [1962]);  and  (3)  Their  inverses— if  they  exist— 
have  row  and  column  sums  equal  to  unity  (though  the  inverse  of  a  doubly  sto¬ 
chastic  matrix  is  doubly  stochastic  only  for  permutation  matrices). 


f 


£oc(';13  |  on  Tor 

.’IV!  5  r.p Ail 
bVM  7 

If'v'mioiir.red  1  [q 
J-O..A  li‘  l cat  l  o nit _ 

By_ . .  . . 

lMr.trihut  ion/ 

Aval in V II Ity  Cones 

Avail  and/or  j 


-4- 

2.  Algorithms 

In  this  section,  we  will  describe  three  iterative  procedures  for  scaling  non¬ 
negative  matrices  to  doubly  stochastic  form.  In  section  three  we  show  that 
when  they  are  applied  to  a  matrix  with  total  support,  the  result  is  a  sequence  of 
iteration  matrices  converging  to  a  multiple  of  a  doubly  stochastic  matrix. 

Our  first  algorithm,  DEV,  was  motivated  by  the  desire  to  have  an  algorithm 
that  would  modify  a  single  row  or  column— leaving  the  remainder  of  the  matrix 
unchanged— at  each  iterative  step.  There  is  a  natural  way  to  select  the  row  or 
column  to  be  changed:  choose  one  whose  sum  deviates  maximally  from  the 
mean  of  the  row  sums  (which  is  also  the  mean  of  the  column  sums).  This 
approach  is  reasonable,  because  matrices  with  equal  row  and  column  sums  are 
scalar  multiples  of  doubly  stochastic  matrices.  For  the  same  reason,  the 
natural  change  is  to  multiply  entries  in  the  selected  row  or  column  by  a  factor 
chosen  so  that  its  new  sum  will  be  the  new  mean  of  row  sums. 

ALgorithm  1  (called  DEV,  for  deviation  reduction) 

Given  A  =  <4^01  an  n  x  n  matrix: 

(1)  Compute  row  and  column  sums  for  A: 

n  «-  2  ay  i  =  1 . n 

i 

c>  *-  £  “ij  i  ~  1 . n 

t=i 

Compute  the  mean,  p,  of  row  sums  in  A: 


(2)  Find  indices  p  and  q  so  that 

I  rp  -  u  [  =  max  '  rx  -  p\ 

and 

‘.c„~p  =  max  c.  -  p. 

y  i 

If  j  rp  -  yu  <  tol  p,  and  c7  -  p.  <tol  p,  go  to  step  5. 
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If  .  c,  —  /x  >  rp  -  p  go  to  step  4. 

(3)  Calculate  the  mean  p,  of  row  sums  other  than  rp  : 

i  f  a  ) 


1  A 
,*> rp 


Scale  row  p  to  p: 


°w  -  “rf  ~ 


•&—  j  =  1,  •  •  •  ,n 


Update  row  and  column  sums  for  A: 


rp*-P 


j  *■  c,  +  ~  ll  “p;  J  =  !• 

•  n 


'  '  I’-p  J 

Go  to  step  2. 

(4)  Calculate  the  mean,  p,  of  column  sums  other  than  c?: 


n  - 1  I 


fe='. 


Scale  column  q  to  p: 


dig  «-  0(5 


•  i  =  l,  •  •  ■  ,n 


Update  row  and  column  sums  for  4: 


c?  “  P 


rt  *-  r,  +  - 1 

c„ 


Go  to  step  2. 
(5)  Normalize: 


“tj  -  —«*y  i’j  ~  !•  ’  ’ 

r 


Exit. 
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Remarks 

Note  that  step  3  is  equivalent  to  premultiplying  matrix  A  by  a  positive  diag¬ 
onal  matrix: 

D  -  ding  (a!,.  •  •  •  .  d*) 


where 


di  = 


1  if  i  A  p 
U  if  i  =  p 


and  step  4  is  equivalent  to  post  multiplying  matrix  A  by  a  positive  diagonal 
matrix: 


E  =  dingie^  ,e„) 


where 


if  3  *  q 
if  j  =  9 


'tie  say  that  a  row  and  column  pair  in  a  nonnegative  matrix  is  balanced 
(with  respect  to  the  1-norm)  if  they  have  equal  sums.  Obviously,  all  row  and 
column  sums  in  a  multiple  of  a  doubly  stochastic  matrix  are  balanced.  A  second 
approach  to  scaling  to  doubly  stochastic  form,  then,  is  to  find  a  row  and  a 
column  whose  sums  have  maximal  difference  and  to  scale  the  matrix  so  that 
their  sums  are  equal.  This  is  the  approach  taken  by  our  second  algorithm. 


Algorithm  2  (called  BAL,  for  balance) 

Given  A  =  A'0)  an  nxn  nonnegative  matrix: 

(l)  Compute  row  and  column  sums  for  A: 

rt  -  £  °v  i  =  1,  ,n 

j=* 

n 

Cj  •-  V  j  =  1.  ,n 

t  =  l 


and  the  mean  of  row  sums: 


P 


_1_ 

n 


2  n 

it=i 


M*. 


lilfiMMiUMaHltii  ii  ill  •  1 ' 


(2)  Find  indices  p  and  q  so  that: 


I  rp  —  c„  i  =  max  rx  -  c, 

i.j 


If  i  T-p  —  cq  <  fj.  taL  go  to  step  5 

(3)  Balance  row  p  and  column  q: 
Multiply  entries  in  row  p  by: 


_  ci  CL?<< 
rp  ~apq 


and  multiply  entries  in  column  q  by  / 

(4)  Update  row  and  column  sums: 


n  *-  rx+  (f  1  -  1)  CLU,  1  =  1.  •  •  •  ,n 


rp  ](rj>  “pqM  +  ‘hi 


Go  to  step  2. 

(5)  Normalize: 


Cj  +  (/-‘0  Op;  j  =  1. 

M  h  t  rx 


*-  ^  =  1*  ‘  ‘  '  -n 
H' 


Note  that  step  3  is  equivalent  to  forming  the  product  DAE  where: 

D  =  diag(dx,  •  .O 

eL  =  ]• 

/  ,  if  i  =  p 

E  =  diag(eu  •  .O 

8j  =  1,  if  j  *  q, 


f 
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Now  for  the  third  method.  When  testing  DEV  we  found  cases  where  a 
sequence  of  10  or  more  iterations  were  alternately  seeding  the  same  row  and  the 
same  column.  Our  third  algorithm  is  a  variant  of  DEV  that  avoids  this  problem. 
It  records  the  last  row  and  last  column  scaled;  when  it  detects  a  repeat,  it  per¬ 
forms  a  balancing  step. 


Algorithm  3  (called  EQ,  for  equalize) 

Given  A  -  4'°^  an  nxn  nonnegative  matrix: 

(l)  Initialize: 


lastr  «-  0 

fastc  <-  0 

2 
;  =  1 

,n 

£  3  =  !•  ' 

t=l 

,71 

M  -  (f*  r, 

M  n  /=*,  1 

(2)  Find  indices  p  and  q  so  that: 


!  rp  -  p,  =  max  rt  —  p 


|c,  -M1 


max  !  Cj  —  /j., 


If  \Tp  —  jui  <  fj.  tol  and  c7  -  /x\  <  fj, tol  go  to  step  6. 

If  .  rp  ~  M  i  <  cq  —  M  i  go  to  step  4. 

(3)  If  p  =  fasfr  go  to  step  5.  Otherwise,  calculate  the  mean,  Jx ,  of  row  sums 
other  than  rp: 


n  -1 


n 

v  i 

|i=‘ 


and  scale  row  p  to  p: 


•4 
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o*j  *-  afcj  /  j  =  1.  -n 

Multiply  entries  in  column  1  by  / 

Qii  f  ~x  i  =  1.  ,n 

Update  row  and  column  sums: 

ri  *-  T\  +  (/_1  -  l)ou  i  =  1. 
Update  row  and  column  sums: 


ri  «-  ri  +  (/  '  ~  1)  <hi  i  =  1. 


*-  fa  “  aw)(C(  -aw)  +  a*, 


Go  to  2. 

(6)  Normalize: 


Exit. 


Notation 

To  simplify  the  descriptions  of  the  algorithms  we  have  omitted  program¬ 
ming  details.  In  particular,  we  have  assumed  that  all  scaling  and  balancing 
operations  are  carried  out  explicitly  by  modifying  entries  in  matrix  A .  In  the 
next  section,  it  will  be  convenient  to  assume  that  the  iterations  are  carried  out 
implicitly  by  changing  entries  in  a  pair  of  diagonal  matrices  D  and  E. 

Each  algorithm  produces  a  sequence  of  iteration  matrices  which  are  diago¬ 
nally  equivalent  to  the  starting  matrix  ,4  =  <4'c': 

(2.1)  =  D^A  £■(*>  k  =  1.2, 


=  diag(d{kK  ■ 
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E'k)  -  diag(e'k\ 

and  we  set  D =  E'c^  -  I. 

We  introduce  the  following  notation: 


A™  =  (<-*’! 


(So,  for  example,  a^k)  =  ej'k)) 


r.(fc)  =  2  a,)*’ 
;  =  1 


c/k)  =  V  a^> 


and 


Mk  =  — 

71 


E  r^> 

i  =  l 
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3.  Convergence 

In  this  section,  we  will  prove  that  when  a  starting  matrix  has  total  support 
each  of  the  algorithms  described  m  section  2  produces  a  sequence  of  iteration 
matrices  which  converges  to  a  diagonally  equivalent,  doubly  stochastic  limit. 

S1NKH0RN  and  KNOPP  [1967]  showed  that  when  SK  is  applied  to  an  nxn 
nonnegative  starting  matrix  /l'01  =  A  possessing  nonzero  row  and  column  sums, 
the  result  is  a  sequence  of  iteration  matrices  as  in  (2.1)  with  the  following  pro¬ 
perties: 

(Pi)  The  sequence  (  sfc  )fc  =  l2.  is  monotonically  increasing  where: 

(3.1)  s*  =  n  dtkWk)  k  =  1.2, 

i=l 

(P2)  If  lim  — — =  1  then  for  i,7  =  1,  ,n: 

lim  =  1 

ii 

lim  c/*’  =  1 

it—  1 

g 

lim  3  =  1 

tc  —  ey ' 

(P3)  With  the  k 4/1  mean  of  row  sums,  defined  by  (2.3) 

file  -  1  k  =  1,2, 


Algorithms  which,  given  an  nxn  nonnegative  starting  matrix  ,  A,  produce  a 
sequence  of  iteration  matrices  as  in  (2.1)  satsifying  (PI),  (P2),  and  (P3),  will  be 
called  "diagonal  product  increasing"  (DPI).  The  following  result  is  a  simple  gen¬ 
eralization  of  the  convergence  theorem  in  SINKKORN  and  KNOPP  [1967]. 


*  _ _ 
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Theorem  1 

Given  a  sequence  (2.1)  of  diagonal  equivalents  for  A  satisfying  (PI),  (P2).  and 

(P3): 

[1]  If  A  has  support  then  lim  exists  and  is  doubly  stochastic. 

k  — 

[2]  If  /I  has  total  support  then  the  limit  in  [l)  is  diagonally  equivalent  to  A. 
Before  proving  the  theorem  we  state  and  prove  a  corollary: 

Corollary 

[1]  If  A  is  diagonally  equivalent  to  a  doubly  stochastic  matrix,  S,  then: 

5  =  lim 

k 

[2]  If  A  has  support  and  is  not  diagonally  equivalent  to  a  doubly  stochastic 
matrix,  then  for  each  pair  of  indices  (i,j)  such  that  cq;  does  not 

lie  on  a  positive  diagonal: 

lim  a^)k)  =  0 

Proof  of  Corollary: 

(1)  By  BirkhotT’s  theorem  (BIKKHGl'F  [  1946])  the  set.  of  n  x  n  doubly  stochastic 
matrices  is  the  convex  hull  of  the  set  of  n  x  n  permutation  matrices. 
Therefore,  S  and  its  diagonal  equivalent,  A,  have  total  support.  Now  the 
theorem  implies  that  lim  A ^  is  doubly  stochastic  and  diagonally  equivalent 

lc 

to  A ■  Since  doubly  stochastic  equivalents  are  unique,  (see  SIXKKORX  and 
KNOPP  [1967]): 

lim  /l  1*1  =  5 

*  -- 

(2)  By  the  theorem,  lim  A'k^  is  doublv  stochastic,  so  it  has  total  support,  and 

k  -»» 

lim  a*)*'  =  0  whenever  ay  does  not  lie  on  a  positive  diagonal.  ■ 

Note  that  matrices  without  support  are  not  covered  by  the  preceding 
theorem  or  its  corollary.  Such  matrices  are  always  singular,  and  KAhAN  [private 
communication]  has  shown  that  the  sequence  of  iteration  matrices  (  A'*'  )  pro¬ 
duced  by  SK  cycles  for  such  a  starting  matrix. 


Proof  of  Theorem  1: 


We  will  need  the  following  well  known  result: 
Lemma  1  ( The  Arithmetic  /  Geometric  Mean  Inequality ) 

If  xx  &  0  for  i  =  1.  ,n  then: 


ft 


I 

«  =  I 


with  equality  only  when  =  xz  =  •  ■  =  x„ 

(l):  (PI)  implies  (sfc)fc  =  12  is  monotonically  increasing.  Since  A  has  support, 
a  permutation,  <j,  of  {1,  ,n  j  exists  such  that: 


f  —  1,  ,71 


is  a  positive  diagonal  in  A.  l,e;  a  =  min  ( ).  Then 


£  dp'-eft,  a  <,  £  a,  0(j)  =  s  =  n 

i=l  i=l  i=l  i=l 


(Property  (P3)  is  used  for  the  right  hand  equality)  By  the  arithmetic,  geometric 
inequality: 


1  =  1 

and  is  bounded.  Therefore  by  (Pi) 


hm  sk  =  L  >  0 


exists,  and 


sfc 

lim  — —  =  1 


It—  Si, 


By  (P2): 


’  1*  H ) 


.ifcM) 


lim  — ttt —  =  1  and  lim  —  . —  =  1 
t-~  d^k)  ey*> 


By  (P3),  since  the  A'*1  are  nonnegative,  no  entry  a^c)  can  be  larger  ihan  n. 
Therefore,  for  each  index  pair  (i,j)  the  sequence  ( a^*')  is  Cauchy,  and 
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lim  A{k)  =  i4w 

Je  ♦» 

exists.  Since  the  row  and  column  sums  in  .4'”^  must  be: 

limr/**  -  i  =  1.  ,n 

k  — 

lim  c yfc)  =  cy“'  j  =  1,  ,n 

(P2)  implies  that  is  doubly  stochastic. 

(2):  To  prove  the  second  half  of  the  theorem  we  will  need  the  following  lemma 
which  is  paraphrased  from  51NKK0RX  and  KNOPP  [1967],  p.  345. 

Lemma  2 

If  A  is  a  nonnegative  matrix  with  total  support,  and  (y^)  are  positive 

sequences  fori  =  1,  •  •  •  ,n  and  j  =1,  ■  ,n  and: 

lim  =  Ui  >  0 

*  — 

for  each  index  pair  (i.j)  such  that.  /  0,  then  there  exist  positive 
sequences  (x/*')  and  (yy*')  with  positive  limits  such  that: 

it'*)  yjk)  =  xjk '  i/j * 1  for  all  i.j,  and  A: 

Now  for  the  proof.  From  part  (l),  we  know  that  lim  cla*'  =  lim  e/*5 
exists  for  any  i  and  j .  If  ay  ^  0  then  lim  dy*'  ej-k)  exists.  Using  (PI)  we  show  that 
this  limit  is  positive. 

If  c ty  /  0,  it  lies  on  a  positive  diagonal  in  A,  because  A  has  total  support.  Let 
a  be  a  permutation  of  \  1 . nj  such  that: 

ct  (i)  =j 

>  0  1  =  1.  .n 

By  (PI): 


(3.2) 


d,!*)  ey*'  &  s  , 


-1 


/c  =  1.2. 
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Let  a  =  min 


<h}  “ij  *  0 


Then: 


V 

1=1 


“S  I!  di  k)e^i )  at  <l)  <  Tr; 


1*) 


!=1 

!»t 


2 


i=l 


n—  1  n-1 

Now  apply  the  arithmetic/ geometric  mean  inequality: 


n 


i  =  l 
{*•1 


n  a 


n  —  1 


or 


(3.3) 

-1 

> 

jl«l 

I1-1 

Combining  (3.2)  and  (3.3): 

(3.4)  cd^  e/*5  >  s  ! 

(n  -  l] 

n 

which  shows  that  lim  di'~lc'1 

k  -•«» 

ejk  *  >  i 

jn-ll  a]"" 


n  '  i 


n-l 


>  0 


to  see  that  positive  sequences  [^fc)]  and  [e/*)]  with  positive  limits  exist  so  that : 

,£.v*Ogyfc)  -  r£ik)  e[k)  for  eacfr  and  £ 

Set 

=  diag  (  it'*1) 

and 

E''k)  ~  diag  (  el'fc)) 


lim  and 

*  — 

lim  E[k)  =  £'“* 

*  -- 


then 
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exist.  Taking  limits  on  both  sides  of 

D[k)A  E{k)  =  A^k) 

we  obtain 

ZJ(“M  £’(*,)  =  A(m)  ■ 

Naturally,  the  Sinkhorn/Knopp  method  is  product  increasing;  in  the  next 
theorem,  we  will  show  that  normalized  versions  of  DEV,  BAL,  and  EQ,  are  too. 
Here  is  another  example  of  a  DPI  algorithm  defined  for  irreducible,  nonnegative 
matrices: 


Algorithm 

At  each  step:  Normalize  the  rows  by  finding  Y,  a  positive  diagonal  matrix,  so 
that  Y  A^  has  row  sums  1.  Then  normalize  the  columns  by  a  diagonal  similarity 
transform  defined  as  follows: 

Let  x  =  (zlt  ■  •  •  ,  a:n)  be  a  left  Perron  vector  for  Y  A'^'- 

x  Y -lx 

and  let  X  -  diag(xv  ■  ■  ■  ,xn).  Then 

+  =  [x 

has  column  sums  1  because 

(1,  •  .  1  =  (1.  •  ,1) 

(Note  that  the  similarity  transform  leaves  diagonal  products  unchanged) 

Next  we  apply  Theorem  1  to  show  that  the  algorithms  described  in  section  2 
are  convergent  for  starting  matrices  with  total  support. 

Theorem  2 

Suppose  that  the  sequence  of  iteration  matrices 

=  D^AEW  k  =  1,  ■  ■  ,n 


results  from  the  application  of  DEV,  BAL,  or  EQ,  to  A  =  A!01;  then  if  A  has 


j(fc) 

total  support,  lim - 

*  At 


is  doubly  stochastic  and  diagonally  equivalent  to  A. 


Proof 


We  prove  Theorem  2  by  showing  that  the  sequence  of  normalized  iteration 
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By  the  arithmetic/geometric  mean  inequality: 


, .  n  -1 


<  i 


Mi 


Therefore: 


sk «-! 


Afc  +  l 


n  -l 


i 


The  above  arguement  can  be  repeated  for  a  column  scaling  at  step  (k+1),  and 
(Pi)  holds.  Next  define  sequences: 

(3.8)  (A(*)Jt  =  l2  i  =  U  ■  .n 

by 

T 

-2—  if  i=p  and  at  step  (k+1)  rowp  is  scaled 
to  the  mean  of  the  other  row  sums 


if  i=q  and  at  step  (k+1)  column  q  is  scaled 
to  the  mean  of  the  other  column  sums 


M-k 


otherwise 


By  (3.7),  —  ^  xi'k'>  =  1  f°r  each  k. 

n  t  =  i 

inequality  it  can  be  shown  that  from: 


Using  the  arithmetic/geometric  mean 


follows: 


lim  TT  =  lim 

k—  1 1  *-.» 


=  1 


hm  xjk'1  =  1  i  =  1,  ,n 


Since 


at**"  _  m* 

dck)  MfcM 


or 


^  Mfc  f  i  t p 


,71 
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r.(fcM) 


and  similarly 


for  some  L 


e.(*4-i) 

- T7T — =  1  or  -  1  =  1, 

elfc)  cq 


,n 


=  1  or 


for  some  L 


it  follows  that: 


0 

lim 

t~  d.(*) 


,  .{*♦!) 


=  lim  Vi.,  =  1  i  =  1, 

fc—  ei'*' 


At  each  step,  DEV  selects  p  or  q  so  that 


or  £2__! 
Mfc  Mfc 


is  maximal.  It  follows  that  fori, 7  =  1,  ,n: 


lim 


-  =  lim 


n;t> 

12 _ _ 


=  1 


fc—  Mfc  k—  Mfc 

Therefore  (P2)  holds  for  the  sequence  (3.5)  produced  by  DEV. 
BAL: 

Suppose  that  at  step  (k+1)  BAL  balances  row  p  and  column  q.  Let 

x  I  =  ri*  ^  —  fi— I 


'p  “p? 
-4 


,(*)  =  c~'*^  —  czJ£^ 


In  this  case: 

sfc  m 


(3.9) 


where 


I  /  MfcH 


Sfc  [  1  /  Mfc 


/  /"  = 


Mfc-u 


/  = 


.;*) 


1/2 


so  to  show  that  sk  M  &  sfc ,  we  must  show  that  /j.k  * ,  s;  . 
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-22  - 
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r(fc) 
lim  ^L_ 
fc  —  A* 


iim 

fc  — 


A* 


Equation  (3.4)  in  the  proof  of  Theorem  1  holds  whenever  (PI)  and  (P3)  are 
satisfied,  and  for  each  index  pair  (i.j)  such  that  Oy  *  0,  the  sequence 


4k) 


Afc 


k  =  1.2. 


is  bounded  away  from  zero.  Therefore,  the  sequences 


.(*) 


A fc 


and 


,(fc) 


At 


(3.17) 


,.  X'~'/Uk  X'*> 

tun  — - =  lim  ■  =  1 


lUil  JlT  -  W»U  /r.\ 

t-*®  yK*}  /  fji fc  y* 

Finally,  for  each  i.j,  and  k: 


K*  +  <> 


(fcM) 


_ _  A*  ^ _ 

Si(fc)  AfcM  ii(fc) 

p  ffc  +  l) 


_  Afc 

f^-’rC  4*  1 


or 


W 


At 
Afc  +  i 


,  ffcn) 


e,- 


r*)]54 


•  - ,  vte  1  ' 

i*  ; 


Therefore,  by  (3.15)  and  (3.17): 


(3.13) 


w.(fc  +  U 


..  V 
ap) 


.Ifc  +  0 


•=  lim 


,ffc) 


-=  1 


bounded  away  from  zero,  because  x'^  ?  0  and  y W  *  0  for  each  k.  We  have: 
-(fc) , 


for  each  i  and  j.  and  (P2)  is  satisfied. 

EQ: 

Each  step  of  EQ  is  a  step  of  DEV  or  a  balancing  step.  The  arguements  above 
for  DEV  and  BAL  show  that  for  each  fc,  s/e^1  &  sfc,  ie.  that  (Pi)  holds.  Consider 
the  sequence  (3.3),  and  its  subsequence 


=  PiP?. 


where  at  steps  fc'  -Pi.pz,  •  EQ  scaled  a  row  or  column  to  the  mean  of  the 
other  row  or  column  sums.  This  sequence  must  be  infinite— because  lastr  and 
lastc  are  set  to  0  after  each  balancing  step-and  the  arguement  for  DEV  can  be 
repeated  to  show  that: 


(3. 19)  for  i.j  =  1,  ,n 
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4.  Test  Results 

We  ran  comparison  tests  of  the  algorithms  described  in  this  paper  and  the 
Sinkhorn/Knopp  method  on  a  collection  of  50  10x10  or  smaller  matrices. 
These  tests  were  run  on  a  VAX  11/780  at  UC,  Berkeley,  with  7  significant  digits  in 
single  precision  and  16  digits  in  double  precision.  Sums  were  accumulated  in 
double  precision. 

For  convergence  to  "tol"  accuracy,  we  required  that  all  row  and  column 
sums  deviate  from  the  mean,  /j. ,  by  less  than  tol  ji.  So,  in  the  normalized 
matrices,  row  and  column  sums  could  not  deviate  from  1  by  more  than  tol. 

The  examples  in  this  section  were  selected  to  illustrate  the  following  points: 

(1)  EQ  exhibited  significantly  better  average  and  worst-case  behavior  than  SK. 
On  our  test  bed,  for  convergence  to  tol  =  10-5,  the  ratio  of  total  SK  opera¬ 
tions  to  total  EQ  operations  varied  from  a  low  of  1/2  to  a  high  of  more  than 
130. 

(2)  We  found  striking  examples  where  EQ  was  significantly  faster  than  DEV.  BAL, 
or  SK.  Since  each  iteration  by  EQ  scaled  a  row  or  column— like  DEV— or  bal¬ 
anced  a  row  and  column  pair— like  BAL— there  is  evidence  that  some 
mechanism  is  at  work  which  enables  EQ  ;o  choose  the  right  operation  th, 
right  time. 

To  facilitate  comparing  DEV,  BAL.  and  EQ,  we  counted  their  "steps"  in  the 
following  way:  each  scaling  of  a  row  or  column  counted  as  1  step,  and  each 
balancing  of  a  row/column  pair  counted  as  two  steps.  In  this  way,  the  operation 
cost  (where  an  operation  is  a  multiplication  or  division)  was  the  same  for  each 
step  of  each  of  the  three  algorithms. 

To  facilitate  comparing  EQ  and  SK  we  computed  the  approximate  ratio  of 
total  operations  performed  by  EQ  to  total  operations  performed  by  SK. 

These  first  four  examples  were  the  test  matrices  in  MARSHALL  and  0LK1X 
[1968]: 


10410z10z 

10z  1  0 

A  = 

10z  1  1 

B  = 

1 0Z 1 03  1 

10z  1  1 

0  10Z10Z  j 

10z10z  0 

j  104  1  0 

C  = 

10z104  1 

D  = 

1 04 1 0a  1 

0  1  10z 

i 

0  104104 
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STEPS  TO  CONVERGENCE  FOR  MATRIX  .4 


TOL 

DEV 

BAL 

EQ 

10'2 

2 

18 

2 

io-a 

2 

20 

2 

10-4 

2 

26 

2 

10~5 

2 

32 

2 

STEPS  TO 

CONVERCENC 

TOL 

DEV 

BAL 

EQ 

10'2 

102 

19 

21 

10”3 

201 

24 

26 

10'4 

302 

34 

37 

1Q-* 

402 

34 

46 

,K  SK  OPERATIONS 
EQ  OPERATIONS 

1  .9 

1  .9 

1  .9 

1  .9 

FOR  MATRIX  B 

„ K  SK  OPERATIONS 
EQ  OPERATIONS 


39 

3.3 

75 

5.3 

113 

5.6 

150 

6.0 
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STEPS  TO  CONVERGENCE  FOR  MATRIX  C 


TOL 

DEV 

BAL 

EQ 

SK 

SK  OPERATIONS 
EQ  OPERATIONS 

I  O'2 

24 

18 

11 

10 

1.7 

IQ"3 

SO 

24 

19 

307 

29.8 

10'4 

1656 

30 

38 

1092 

53.1 

10"5 

3231 

34 

49 

1899 

71.5 

STEPS  TO  CONVERGENCE  FOR  MATRIX  D 

TOL 

DEV 

BAL 

EQ 

SK 

’  SK  OPERATIONS 
{  EQ  OPERATIONS 

10'2 

231 

22 

21 

84 

7.4 

10'3 

1895 

26 

32 

706 

40.7 

10-4 

4891 

32 

34 

1830 

99.4 

10~5 

7961 

40 

40 

2983 

137.7 
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f 


Our  next  examples,  R  and  3,  are  (5  x  5)  matrices.  They  illustrate  the  curi 
ous  fact  mentioned  in  (2)  above. 


R 


100  1  0  0  0 

0  200  1  0  0 

0  0  300  1  0 

0  0  0  400  1 

1  0  0  0  500 


STEPS  TO  CONVERGENCE  FOR  MATRIX  R 


TOL 

DEV 

BAL 

EQ 

SK 

[  sk  operations) 

(  EQ  OPERATIONS] 

10~z 

10 

14 

9 

1 

.4 

10~3 

115 

23 

31 

214 

24.4 

O 

1 

1682 

2474 

51 

830 

43.6 

10~5 

3912 

4036 

63 

1067 

55.4 
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’  40  0  1  1  1  1 

I  80  0  1  1  I 

5  =  1  1  120  0  1 

II  1  160  0 

.011  1  200  J 


STEPS  TO  CONVERGENCE  FOR  MATRIX  5 


TOL 

DEV 

BAL 

EQ 

SK 

SK  OPERATIONS } 

EQ  OPERATIONS ) 

10-2 

39 

20 

16 

15 

3.3  f 

10-3 

206 

100 

29 

53 

6.5 

10-4 

384 

108 

47 

94 

7.1 

10~5 

570 

198 

62 

136 

7.7  : 
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Our  final  five  examples  are  (10  x  10)  upper  Hessenberg  matrices: 

Hx  ~  (/iy)  u;here  h^.  =  j,  ® 

*  "  1  (1  otherwise 

Hz,  H3,  and  Hi%  each  differ  from  Hi  in  a  single  entry: 
the  (1,1)  entry  in  Hz  is  100 
the  (1,2)  entry  in  H3  is  100 
the  (1,3)  entry  in  H4  is  100 

Hs  is  the  result  of  replacing  all  diagonal  entries  in  H t  by  100. 


Here  is  a  summary  of  the  results  for  tol  =  10 ~5: 

STEPS  TO  CONVERGENCE  FOR  MATRICES  Hx.i=  1,  •  ,5 


MATRIX 

DEV 

BAL 

EQ 

SK 

|  SK  OPERATIONS 
(  EQ  OPERATIONS 

Mx 

812 

748 

912 

55 

.6 

Hz 

873 

926 

717 

72 

.8 

H3 

925 

952 

775 

71 

.7 

H< 

953 

943 

921 

71 

.6 

Hs 

14476 

17456 

917 

1004 

8.9 
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Appendix:  Sample  Iteration  matrices  for  EQ  and  SK 


The  following  pages  contain  sample  iteration  matrices  for  EQ  and  SK  when 
applied  to  matrices  C  and  D.  Normalized  iteration  matrices,  DAE.  are  printed 
with  their  row  and  column  sums  and  deviations. 
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